Contract & Expand: I/O Efficient SCCs Computing

Authors

  • Zhiwei Zhang
  • Lu Qin
  • Jeffrey Xu Yu
Abstract

As an important branch of big data processing, big graph processing has become increasingly popular in recent years. Strongly connected component (SCC) computation is a fundamental operation on directed graphs, where an SCC is a maximal subgraph S of a directed graph G in which every pair of nodes is mutually reachable within S. By contracting each SCC into a single node, a large general directed graph can be represented by a small directed acyclic graph (DAG). In the literature, there are I/O-efficient semi-external algorithms that compute all SCCs of a graph G under the assumption that all nodes of G fit in main memory. However, many real graphs are so large that even their nodes cannot reside entirely in main memory. In this paper, we study new I/O-efficient external algorithms to find all SCCs of a directed graph G whose nodes cannot fit entirely in main memory. Existing external approaches have clear deficiencies: the graph-contraction-based approach usually cannot terminate within a finite number of iterations, and the external-DFS-based approach generates a large number of random I/Os. To overcome them, we explore a new contraction-expansion-based approach. In the graph contraction phase, instead of contracting the whole graph as the contraction-based approach does, we contract only the nodes of the graph, which is much more selective. The contraction phase stops when all nodes of the graph fit in main memory, so that the semi-external algorithm can be used to compute SCCs. In the graph expansion phase, the graph is expanded in the reverse order of contraction, and the SCCs of all nodes in the graph are computed. Both the graph contraction phase and the graph expansion phase use only I/O-efficient sequential scans and external sorts of the nodes and edges of the graph. Our algorithm leverages the efficiency of the semi-external SCC computation algorithm and usually stops within a small number of iterations.
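To make the SCC definition and the DAG remark concrete, here is a small in-memory Python sketch, not an I/O-efficient algorithm, that labels SCCs with an iterative version of Tarjan's algorithm and then contracts each SCC into a single node of the condensation DAG. The function names `tarjan_scc` and `condense` are our own, and the code assumes the whole graph fits in memory, which is exactly the assumption the paper drops.

```python
from collections import defaultdict


def tarjan_scc(nodes, edges):
    """Map every node to an SCC id using an iterative Tarjan traversal."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
    index, low, comp = {}, {}, {}
    stack, on_stack = [], set()
    counter, next_comp = 0, 0
    for root in nodes:
        if root in index:
            continue
        index[root] = low[root] = counter
        counter += 1
        stack.append(root)
        on_stack.add(root)
        work = [(root, iter(adj[root]))]  # explicit DFS stack: (node, successor iterator)
        while work:
            v, successors = work[-1]
            descended = False
            for w in successors:
                if w not in index:  # tree edge: descend into w
                    index[w] = low[w] = counter
                    counter += 1
                    stack.append(w)
                    on_stack.add(w)
                    work.append((w, iter(adj[w])))
                    descended = True
                    break
                if w in on_stack:  # edge back into a node still on the stack
                    low[v] = min(low[v], index[w])
            if descended:
                continue
            work.pop()  # all successors of v have been explored
            if work:
                parent = work[-1][0]
                low[parent] = min(low[parent], low[v])
            if low[v] == index[v]:  # v is the root of an SCC: pop its members
                while True:
                    w = stack.pop()
                    on_stack.discard(w)
                    comp[w] = next_comp
                    if w == v:
                        break
                next_comp += 1
    return comp


def condense(nodes, edges):
    """Contract each SCC into one node; return (labels, DAG edge set)."""
    comp = tarjan_scc(nodes, edges)
    dag = {(comp[u], comp[v]) for u, v in edges if comp[u] != comp[v]}
    return comp, dag
```

For example, `condense([1, 2, 3, 4, 5], [(1, 2), (2, 3), (3, 1), (3, 4), (4, 5), (5, 4)])` places nodes 1, 2, 3 in one component and 4, 5 in another, with a single DAG edge between them. The traversal is iterative only to avoid Python's recursion limit on deep graphs.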
We further optimize our approach by reducing the number of nodes and edges of the contracted graph in each iteration. We conduct extensive experimental studies using both real and synthetic web-scale graphs to confirm the I/O efficiency of our approaches.
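The contract-and-expand idea can be illustrated with a minimal in-memory Python sketch, under assumptions of our own. It contracts one node v at a time and replaces it with bypass edges u → w for every in-neighbour u and out-neighbour w, which preserves reachability among the remaining nodes. A hypothetical `max_nodes` budget stands in for the main-memory limit, the greedy "fewest bypass edges" selection rule is our assumption rather than the paper's, and an in-memory Kosaraju pass stands in for the semi-external SCC algorithm. Expansion replays removals in reverse: a removed node joins an SCC exactly when one of its recorded in-neighbours and one of its recorded out-neighbours already share a label. None of the paper's sequential scans, external sorts, or size-reduction optimizations are modeled here.

```python
from collections import defaultdict


def scc_labels(nodes, succ):
    """In-memory Kosaraju: map each node to a representative of its SCC."""
    pred = {v: set() for v in nodes}
    for u in nodes:
        for w in succ[u]:
            pred[w].add(u)
    order, seen = [], set()  # pass 1: nodes by increasing DFS finish time
    for root in nodes:
        if root in seen:
            continue
        seen.add(root)
        work = [(root, iter(succ[root]))]
        while work:
            v, it = work[-1]
            for w in it:
                if w not in seen:
                    seen.add(w)
                    work.append((w, iter(succ[w])))
                    break
            else:  # no unvisited successor left: v is finished
                work.pop()
                order.append(v)
    label = {}  # pass 2: sweep the reversed graph in decreasing finish time
    for root in reversed(order):
        if root in label:
            continue
        label[root] = root
        todo = [root]
        while todo:
            v = todo.pop()
            for u in pred[v]:
                if u not in label:
                    label[u] = root
                    todo.append(u)
    return label


def contract_expand_scc(edges, max_nodes):
    """Contract nodes until at most max_nodes (hypothetical budget) remain, solve, expand."""
    succ, pred, nodes = defaultdict(set), defaultdict(set), set()
    for u, v in edges:
        nodes.update((u, v))
        if u != v:  # self-loops never change SCC membership
            succ[u].add(v)
            pred[v].add(u)
    alive, removed = set(nodes), []
    # Contraction. ASSUMPTION: greedily remove the node whose bypass adds
    # the fewest edges; the paper's own selection rule may differ.
    while len(alive) > max_nodes:
        v = min(alive, key=lambda x: len(pred[x]) * len(succ[x]))
        ins, outs = set(pred[v]), set(succ[v])
        removed.append((v, ins, outs))  # neighbourhood at removal time
        alive.discard(v)
        for u in ins:
            succ[u].discard(v)
        for w in outs:
            pred[w].discard(v)
        for u in ins:  # bypass edges keep reachability among survivors
            for w in outs:
                if u != w:
                    succ[u].add(w)
                    pred[w].add(u)
        succ.pop(v, None)
        pred.pop(v, None)
    # Stand-in for the semi-external step: the remaining graph fits in memory.
    label = scc_labels(alive, {v: succ[v] for v in alive})
    # Expansion in reverse order: v joins an SCC iff some in-neighbour u and some
    # out-neighbour w share a label (w reaches u, so v -> w ~> u -> v is a cycle).
    for v, ins, outs in reversed(removed):
        out_labels = {label[w] for w in outs}
        joined = next((label[u] for u in ins if label[u] in out_labels), None)
        label[v] = v if joined is None else joined
    return label
```

Running it on the edges `[(1, 2), (2, 3), (3, 1), (3, 4), (4, 5), (5, 4), (5, 6)]` with `max_nodes=2` groups {1, 2, 3} and {4, 5} and leaves 6 as a singleton, the same partition obtained with no contraction at all; the label values themselves are arbitrary representatives. Removing one node at a time keeps the expansion rule simple, because every neighbour recorded at removal time is guaranteed to be labelled before the removed node is restored.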

Similar resources

Parallel and distributed computing for data mining

Similar scenarios will occur in other areas: we will see large numbers of radiological images generated in hospitals and immense product and customer databases as the Internet and e-commerce continue to expand.1 Exploring useful information from such data will require efficient parallel algorithms running on high-performance computing systems with powerful parallel I/O capabilities. Without suc...

A Dynamic Threshold Decryption Scheme Using Bilinear Pairings

A dynamic threshold sharing scheme is one that allows the set of participants to expand and contract. In this work we discuss dynamic threshold decryption schemes using bilinear pairing. We discuss and analyze existing schemes, demonstrate an attack and construct a significantly more efficient secure scheme.

Efficient Approximation Algorithms for Point-set Diameter in Higher Dimensions

We study the problem of computing the diameter of a set of $n$ points in $d$-dimensional Euclidean space for a fixed dimension $d$, and propose a new $(1+\varepsilon)$-approximation algorithm with $O(n + 1/\varepsilon^{d-1})$ time and $O(n)$ space, where $0 < \varepsilon \leqslant 1$. We also show that the proposed algorithm can be modified to a $(1+O(\varepsilon))$-approximation algorithm with $O(n+...

Computing PI and Hyper–Wiener Indices of Corona Product of some Graphs

Let G and H be two graphs. The corona product G ∘ H is obtained by taking one copy of G and |V(G)| copies of H, and joining each vertex of the i-th copy of H to the i-th vertex of G, for i = 1, 2, …, |V(G)|. In this paper, we compute the PI and hyper-Wiener indices of the corona product of graphs.

On External-Memory MST, SSSP, and Multi-way Planar Graph Separation

Recently, external-memory graph problems have received considerable attention because massive graphs arise naturally in many applications involving massive data sets. Even though a large number of I/O-efficient graph algorithms have been developed, a number of fundamental problems still remain open. The results in this paper fall into two main classes. First we develop an improved algorithm for th...

Publication date: 2014